Vocabulary expansion through automatic abbreviation generation for Chinese voice search
Authors
Abstract
Long named entities are often abbreviated in spoken Chinese, which frequently causes out-of-vocabulary (OOV) problems in speech recognition applications. Generating Chinese abbreviations is much more complex than generating English ones, most of which are acronyms or truncations. In this paper, we propose a new method for automatically generating abbreviations of Chinese named entities, and we use the output of the abbreviation model to expand the vocabulary for voice search. In our abbreviation modeling, we convert abbreviation generation into a tagging problem and use a conditional random field (CRF) as the tagger. For vocabulary expansion, because a full name may have several valid abbreviations and the top-1 candidate alone has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments, the proposed abbreviation model achieved a top-10 coverage of 88.3%, and adding the top-10 abbreviation candidates to the vocabulary improved voice search accuracy from 16.9% to 79.2%.
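The abstract frames abbreviation generation as character tagging and then adds the N best outputs to the recognizer vocabulary. The sketch below illustrates that pipeline under assumptions the abstract does not state: a hypothetical two-tag keep/skip scheme (`K`/`S`), greedy left-to-right alignment of training pairs, and a brute-force scorer with independent per-character probabilities standing in for the CRF's N-best decoding. All function names and the example probabilities are hypothetical.

```python
from itertools import product
from math import log

# Hypothetical two-tag scheme: K = character kept in the abbreviation, S = skipped.
KEEP, SKIP = "K", "S"


def tags_from_pair(full, abbr):
    """Turn a (full name, abbreviation) training pair into one tag per character.

    Uses greedy left-to-right alignment; returns None when `abbr` is not a
    character subsequence of `full` (such pairs cannot be expressed as K/S tags).
    """
    tags, j = [], 0
    for ch in full:
        if j < len(abbr) and ch == abbr[j]:
            tags.append(KEEP)
            j += 1
        else:
            tags.append(SKIP)
    return tags if j == len(abbr) else None


def abbr_from_tags(full, tags):
    """Read an abbreviation back off a tag sequence."""
    return "".join(ch for ch, t in zip(full, tags) if t == KEEP)


def top_k_candidates(full, keep_prob, k=10, eps=1e-9):
    """Stand-in for a CRF N-best decoder (this is NOT a CRF).

    Scores every tag sequence by the sum of independent per-character
    log-probabilities and returns the k best distinct abbreviations.
    Brute force over 2**len(full) sequences, so only suitable for short names.
    """
    best = {}
    for tags in product((KEEP, SKIP), repeat=len(full)):
        kept = tags.count(KEEP)
        if kept == 0 or kept == len(full):  # skip the empty string and the full name
            continue
        score = 0.0
        for t, p in zip(tags, keep_prob):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            score += log(p if t == KEEP else 1 - p)
        abbr = abbr_from_tags(full, tags)
        if abbr not in best or score > best[abbr]:
            best[abbr] = score  # keep the best score per distinct string
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:k]


def expand_vocabulary(vocab, entity_probs, k=10):
    """Return a new vocabulary with the top-k candidates of every entity added."""
    expanded = set(vocab)
    for full, probs in entity_probs.items():
        expanded.update(top_k_candidates(full, probs, k))
    return expanded


def top_k_coverage(references, candidate_lists, k=10):
    """Fraction of entities whose reference abbreviation is among the top-k candidates."""
    hits = sum(ref in cands[:k] for ref, cands in zip(references, candidate_lists))
    return hits / len(references)


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.8, 0.1]  # made-up keep probabilities for 北京大学
    print(tags_from_pair("北京大学", "北大"))  # ['K', 'S', 'K', 'S']
    print(top_k_candidates("北京大学", probs, k=1))  # ['北大']
```

In a real system the per-character scores would come from a trained CRF, whose Viterbi and N-best decoding also account for label transitions; the independent scorer here only shows how ranked candidates feed vocabulary expansion and how top-k coverage is measured.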
Similar resources
Automatic Chinese Abbreviation Generation Using Conditional Random Field
Dong Yang, Yi-cheng Pan, and Sadaoki Furui, Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan ({raymond,thomas,furui}@furui.cs.titech.ac.jp). Boulder, Colorado, June 2009. © 2009 Association for Computational Linguistics. Abstract: This paper presents a new method for automatically generating abb...
Cluster based Chinese abbreviation modeling
Abbreviations in Chinese are widely observed in Chinese spoken language. Automatic generation of Chinese abbreviations helps to improve Chinese natural language understanding systems and Chinese search engine. The abbreviation generation is treated as a character-based tagging problem. Due to limited training data, Chinese abbreviation generation suffers from data sparseness. Two types of strat...
Constructing Chinese Abbreviation Dictionary: A Stacked Approach
Abbreviation is a common linguistic phenomenon with wide popularity and high rate of growth. Correctly linking full forms to their abbreviations will be helpful in many applications. For example, it can improve the recall of information retrieval systems. An intuition to solve this is to build an abbreviation dictionary in advance. This paper investigates an automatic abbreviation generation me...
Improving Automatic Abbreviation Expansion within Source Code to Aid in Program Search Tools
Software maintenance is an important part of the software lifecycle. Understanding large software systems that are unfamiliar can be difficult for maintenance programmers. Intelligent and robust search tools are one method for facilitating program understanding and comprehension. One of the major problems associated with improving search tools is the use of abbreviations within software. The fo...
Optimizing Cost Function in Imperialist Competitive Algorithm for Path Coverage Problem in Software Testing
Search-based optimization methods have been used for software engineering activities such as software testing. In the field of software testing, search-based test data generation refers to application of meta-heuristic optimization methods to generate test data that cover the code space of a program. Automatic test data generation that can cover all the paths of software is known as a major cha...